Common Dissimilarity Measures are Inappropriate for Time Series Clustering

Authors

Cássio M. M. Pereira

Rodrigo Fernandes de Mello

Abstract

Clustering algorithms are widely used to identify similar time series, providing a better understanding of the data. However, common dissimilarity measures disregard the correlation structure of time series, yielding poor results. In this paper, we introduce a dissimilarity measure based on the partial autocorrelations of the series. Experiments compare hierarchical clustering algorithms using common dissimilarity measures, such as Euclidean distance and Dynamic Time Warping, to cluster time series that follow Box-Jenkins autoregressive models. Results show that our dissimilarity measure yields better clusterings on both synthetic and real data sets in terms of the Adjusted Rand Index and the normalized Hubert Γ statistic. Our findings confirm that the choice of dissimilarity measure is crucial to time series clustering quality.
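The abstract does not give the formula for the proposed measure, so the following is only a minimal sketch of one plausible reading: the Euclidean distance between the first few partial autocorrelations of two series, with the partial autocorrelations estimated by the Durbin-Levinson recursion. The function names, the `max_lag` value, and the AR(1) coefficients are assumptions chosen for illustration, not the authors' implementation.

```python
import math
import random


def autocorrelations(x, max_lag):
    """Sample autocorrelations r[0..max_lag] of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    c0 = sum(v * v for v in centered)
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


def pacf(x, max_lag):
    """Partial autocorrelations at lags 1..max_lag (Durbin-Levinson recursion)."""
    r = autocorrelations(x, max_lag)
    phi_prev = []  # AR coefficients of the order k-1 fit
    out = []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_new.append(phi_kk)
        phi_prev = phi_new
        out.append(phi_kk)
    return out


def pacf_dissimilarity(a, b, max_lag=10):
    """Assumed form of the measure: Euclidean distance between PACF vectors."""
    pa, pb = pacf(a, max_lag), pacf(b, max_lag)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(pa, pb)))


def simulate_ar1(phi, n, rng):
    """Simulate x[t] = phi * x[t-1] + e[t] with standard normal noise e[t]."""
    x, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x


rng = random.Random(0)
s1 = simulate_ar1(0.8, 500, rng)   # two series from the same AR(1) model
s2 = simulate_ar1(0.8, 500, rng)
s3 = simulate_ar1(-0.8, 500, rng)  # one series from a different AR(1) model

same_model = pacf_dissimilarity(s1, s2)
diff_model = pacf_dissimilarity(s1, s3)
print("same model:", same_model)
print("different model:", diff_model)
```

Under this assumed form, two independent realizations of the same autoregressive model should end up close to each other even though their raw values differ point by point, which is the situation where a pointwise measure such as Euclidean distance struggles. A pairwise matrix of these values could then be handed to any hierarchical clustering routine.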


Similar resources

An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering

Multivariate time series (MTS) data are ubiquitous in science and daily life, and how to measure their similarity is a core part of MTS analyzing process. Many of the research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...


Comparing Dissimilarity Measures: A Case of Banking Ratios

The aim of this paper is twofold. Firstly, to discuss a clustering of a given set of the European banks into groups based on their performance during 1999–2013. Secondly, to compare different dissimilarity measures and to determine which of them suits best for clustering banking ratios. Six ratios that reveal profitability, efficiency, stability and loan portfolio quality of the banks were used...


Clustering Symbolic Time-Series using L-tuples

Among the many dimensionality reduction methods for timeseries data, Symbolic Aggregate approXimation (SAX) is perhaps the most popular due to its simplicity and uniqueness. With SAX, time-series data can be represented as string sequences which enables the utilization of methods found in text mining and bioinformatics to enhance data mining tasks. We propose an application of L-tuples to impro...


A Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach

In recent years, the advancement of information gathering technologies such as GPS and GSM networks have led to huge complex datasets such as time series and trajectories. As a result it is essential to use appropriate methods to analyze the produced large raw datasets. Extracting useful information from large data sets has always been one of the most important challenges in different sciences,...


A new approach for clustering gene expression time series data

Identifying groups of genes that manifest similar expression patterns is crucial in the analysis of gene expression time series data. Choosing a similarity measure to determine the similarity or distance between profiles is an important task. This paper proposes a suitable dissimilarity measure for gene expression time series data sets. It also presents a graph-based clustering method for findi...



Journal:

RITA

Volume 20


Publication date: 2013
